Education Working Paper Archive 






The Effect of Black Peers on Black Test Scores 



September 12, 2007 



David J. Armor 

School of Public Policy 
George Mason University 

Stephanie Duck 

School of Public Policy 
George Mason University 



Abstract 






Recent studies have used increasingly complex methodologies to estimate the effect of peer characteristics — race, 
poverty, and ability — on student achievement. A paper by Hanushek, Kain, and Rivkin using Texas state testing 
data has received particularly wide attention because it found a large negative effect of school percent black on 
black math achievement. This paper replicates the HKR models using state testing data from North and South 
Carolina and national testing data from the Early Childhood Longitudinal Study. The replications fail to support 
the Texas results. In most models tested, black peer effects are small and not statistically significant, and in the 
few instances where effects are significant, they are much weaker than those found in Texas. Moreover, it 
appears that computational problems in the HKR study led to incorrect estimates for black peer effects. 




THE EFFECT OF BLACK PEERS ON BLACK TEST SCORES 1, 2






There has been a long-standing debate among social scientists about the effects of school desegregation 
on the academic achievement of minority students. While desegregation plans have many components involving 
students, staff, and programs, all of which have potential impacts on learning, the most contentious issue has been 
whether school or classroom racial composition (or racial diversity) improves achievement. 

Indeed, the origin of the controversy dates to the original Brown decision, which cited social science 
evidence showing that psychological harms from segregation might cause black students to do poorly in school. 
Although most legal scholars reject the notion that Brown relied on the psychological harm thesis, the citations,
listed in the famous Footnote 11, sparked extensive research and writing on this question over the past five
decades. In fact, the issue is still very much alive in school policy as well as the federal courts. During the 
2006-07 term the Supreme Court reviewed two school desegregation cases from Seattle and Louisville where 
defendant school boards justified their continued reliance on race in student assignment because of educational 
benefits for minority students (Jefferson County Board of Education, 2006; Seattle School District No. 1, 2006).

Expert studies and testimony in the Seattle and Louisville cases cited a study of black peer effects by
Hanushek, Kain, and Rivkin (HKR). 3 Using a massive set of longitudinal test scores from the Texas state testing
program, HKR developed special models for panel data and found a sizeable negative effect of school percent 
black on black achievement. This was surprising, given many prior studies over many years that had found 
weak relationships at best (St. John, 1975; Cook, 1984; Schofield, 1995; Armor, 2002). As the authors state, 
however, the Texas state achievement data allowed development and application of sophisticated fixed effect 
models that cannot be tested on smaller data sets. 

Accordingly, the impetus for the present study is to replicate the HKR model using other longitudinal
achievement data to determine whether the large black peer effect found by HKR generalizes beyond the state of
Texas. Because of the federal No Child Left Behind Act, the Texas data are no longer unique; state
accountability systems in education have generated large-scale databases of test scores in many states, and some
states have allowed researchers access to that data. In addition, the Department of Education has developed 
several longitudinal achievement studies, one of which is the Early Childhood Longitudinal Study (ECLS). 

These single-cohort longitudinal studies also allow estimation of HKR-type panel models. 



1 Portions of this paper were presented at the Fall APPAM Research Conference in Madison, Wisconsin, November 4, 2006, and also 
appeared in the Education Working Paper Archives at the University of Arkansas. 

2 Special thanks to Brian Bucks for comments on earlier versions of this paper. 

3 Several versions of this paper have been circulated. See Hanushek, Eric A., John F. Kain, and Steven G. Rivkin, “New Evidence 
about Brown v. Board of Education: The Complex Effects of School Racial Composition on Achievement,” National Bureau of 
Economic Research, Working Paper 8741, revised in February 2004. The first version of this paper was published in 2002, and a more 
recent revision is dated March 2006. We will rely on the 2004 and 2006 versions in this paper.




This paper replicates the HKR models using data from two state testing programs, North and South 
Carolina, both of which have data quite similar to the Texas data. It also tests the HKR model using the ECLS 
data. 

Theories of Black Peer Effects 

Although the main thrust of this paper is empirical, it might be useful to review the major theoretical 
perspectives behind the study of black peer effects. Three major theories attempt to explain why school
segregation, or predominantly black schools, has adverse effects on black achievement and, conversely,
why desegregated schools should improve black academic achievement. The first is self-esteem theory, a second
is educational inputs theory, and a third is peer group theory. A fourth theory attempts to explain why school 
composition should have limited effects on achievement. 

Self-esteem theory postulates that school segregation creates a “stigma” for black children which affects 
their self-image and motivation to succeed. This is the theory behind the “psychological harm” thesis in the 
original Brown v. Board of Education decision. 4 This theory was supported by some early social science research,
in particular the famous “doll” studies of Kenneth and Mamie Clark (Clark, 1939). However, most research 
conducted after 1960 found little evidence for the theory, and indeed numerous self-esteem studies found that 
black children had higher self-esteem than white children and also that black children in segregated schools had 
higher self-esteem than black children in desegregated schools (Armor, 1995; Schofield, 1995).

The second theory is based on educational input-output models drawn from economics. In this case, 
input-output theory suggests that greater school inputs, such as more funds, higher teacher quality, and reduced
class sizes, will produce higher outputs in the form of student achievement. During the 1950s and 1960s, many
social scientists and educators believed that school resources were deficient in segregated black schools, thereby 
explaining the lower achievement of black students. Since desegregation would put black and white children in 
the same schools, school resources would be more equitable and therefore the achievement gap would diminish. 
The controversial Coleman Report challenged this assumption, finding that by the mid-1960s school resources 
were not that different between predominantly black and white schools, and, moreover, that school resources 
were only weakly related to student achievement (Coleman et al., 1966). Although the validity of Coleman’s
conclusions was upheld in subsequent analyses of his data (Jencks, 1972; Mosteller & Moynihan, 1972), the
issue continues to be debated among educational researchers (Hanushek, 1996; Hedges, Laine, & Greenwald 
1994; Grissmer & Williamson, 1998; Rothstein, 1995). The current debate is not so much on whether school 

4 “To separate [black children] from others of similar age and qualifications solely because of their race generates a feeling of inferiority 
as to their status in the community that may affect their hearts and minds in a way unlikely ever to be undone. . .” Brown v. Board of
Education, 347 U.S. 483 (1954). 


resources have an effect on academic achievement, but rather how large the effect is and whether equitable 
resources alone can solve the problem of low black achievement. 

Perhaps the most relevant theory for this study is peer group theory itself. Classical peer group theory 
postulates that a higher proportion of higher achieving middle-class students will determine the
teaching standards of schools and classrooms, thereby improving the performance of lower achieving students
(Armor, 2006). Although classical peer group theory does not require racial desegregation per se, just the 
presence of a sufficient number of middle class students, as a practical matter the high correlation between race 
and socioeconomic status implies that SES integration cannot be accomplished without racial integration. 

A different type of peer group theory is black oppositional culture (Fordham and Ogbu, 1986). Their 
argument suggests that black students face significant pressure from their black peers to reject academic success 
because such success equates to giving up African-American culture and ‘acting white.’ Therefore, a 
desegregated school can reduce this negative effect by replacing black peers with white peers. While this theory 
has been embraced by some educators (e.g., Rothstein 2004), oppositional culture theory has not been supported 
by several empirical studies (Cook & Ludwig 1998; Ainsworth-Darnell & Downey 1998).

The fourth theory explains why desegregated schools or reduction in black peers may not improve black 
achievement. Family background theory posits that academic achievement is determined primarily by a series of 
parent characteristics, and therefore school characteristics (whether racial composition or school resources) have
minimal impact on black achievement. These family factors, including parent IQ, parent education and income,
family structure and size, birth weight and nutrition, and parenting behavior such as cognitive stimulation, operate
during infancy and explain the school readiness gap, but they continue to operate throughout the early school
years and can explain a great deal of the achievement gap between black and white children (Brooks-Gunn,
Klebanov, and Duncan, 1995; Armor, 2003; Brooks-Gunn and Markman, 2005). When comparing black
students in segregated and desegregated schools, this theory underscores the need to control for as many family 
background characteristics as possible, since blacks in segregated schools may come from more disadvantaged 
families than blacks in desegregated schools. 

Models for Assessing Peer Effects 

Since the early versions of the HKR paper were circulated, a number of studies have appeared with a 
variety of different models for analyzing peer effects. Most of these newer models estimate peer effects at the 
classroom level, and they also place substantial emphasis on estimating the effect of peer achievement levels 
along with peer race. Indeed, it may well be that the reason for a negative black peer effect is not because of race 
but because black students on average have lower achievement than white students. Thus most studies of peer 
effects, including HKR, include average peer achievement along with peer racial composition in their models. 



To date, other studies of peer effects have used either the North Carolina or the Florida state testing data. 
Using North Carolina data, Vigdor and Nechyba (2004) estimated classroom- and school-level peer effects for 
race and achievement levels; they find a classroom-level effect for peer achievement, but they do not find 
consistent effects for racial composition. Another study of achievement in Wake County, North Carolina, 
developed nonlinear models for peer effects at the classroom level (Hoxby and Weingarth, 2005); this study also 
finds weak effects for racial composition at the classroom level when peer achievement is properly accounted for. 
Finally, another study using North Carolina data finds a small effect of desegregation on closing the black-white 
gap, although only by .06 standard deviations (Cooley, 2006). A study of Florida state testing data also 
estimated peer effects at the classroom level, and it found “no effects of percent black students after controlling 
for student, school, and teacher effects” (Burke and Sass, 2004). 

Thus other studies using North Carolina and Florida state testing data have failed to confirm a large
negative effect arising from concentrations of black students, thereby rendering the HKR study even more of
an outlier. However, none of these studies are true replications because they all impose substantially different
models and assumptions than those adopted by HKR. In particular, they all estimate peer effects at the classroom 
level rather than the school and grade level used by HKR. Because the Texas state test data do not include 
classroom identification, HKR have not examined peer effects at the classroom level. Since this study is 
primarily a replication of the HKR study of Texas, we confine our estimation of peer effects to the school and
grade level rather than the classroom level.

Moreover, we believe that the HKR approach is more relevant to the issue of school desegregation policy. 
Both courts and legislative bodies have generally distinguished between racial composition at school and 
classroom levels, and most school desegregation policies and court rulings focus on school rather than classroom 
composition. While absence of a black peer effect in desegregated schools might be caused by resegregation at 
the classroom level, the HKR study found a large black peer effect at the school and grade level, thereby 
rendering classroom racial composition moot for Texas. Of course, if the HKR result is not replicated in other 
states, then it would be appropriate to investigate classroom resegregation as a possible explanation for lack of a 
black peer effect at the school level. 

The original HKR paper formulated a longitudinal model of achievement which postulated current 
achievement as a function of current student, school, and peer characteristics as well as the prior year’s 
achievement, as represented by equations (1) and (2) below (HKR, 2002, 2004). For child i in school s and 
grade g, $A_{isg}$ is achievement, X is a vector of individual student characteristics (e.g., family socioeconomic
status), S is school and teacher characteristics, P is average peer characteristics, and u is an error term.

\[ A_{is(g-1)} = X_{is(g-1)} B + S_{s(g-1)} C + P_{s(g-1)} D + A_{is(g-2)} + u_{is(g-1)} \tag{1} \]




\[ A_{isg} = X_{isg} B + S_{sg} C + P_{sg} D + A_{is(g-1)} + u_{isg} \tag{2} \]



By taking the first differences of the achievement equations for two consecutive years, HKR arrive at a gain- 
score (or value-added) model described in equation (3) below. 

\[ A_{isg} - A_{is(g-1)} = \Delta A_{isg} = X_{isg} B + S_{sg} C + P_{sg} D + u_{isg} \tag{3} \]
Gain model

The residual term u is further decomposed according to several fixed effect terms which we will discuss later. In
their most recent revision, HKR propose an additional lagged achievement model which becomes their preferred
approach (HKR, 2006). In their new formulation, the coefficient for the prior year achievement $A_{is(g-1)}$ is not
assumed to be 1, so that we have instead

\[ A_{isg} = X_{isg} B^* + S_{sg} C^* + P_{sg} D^* + \theta A_{is(g-1)} + u_{isg} \tag{4} \]
Lagged model



Coefficients for student (B and B*), school (C and C*), and peer effects (D and D*) are assumed to be 
constant from one grade or year to the next (e.g., parent SES exerts the same effect on annual achievement gains 
in grade 3 as in grade 4, etc.) in this model. However, student, school, and peer characteristics themselves can 
(and do) change from year to year, which would give rise to changes in achievement according to the model. It 
should be noted that the Texas analysis includes three cohorts of students, and our North and South Carolina 
analysis uses four or five cohorts. 5 Their equations show a superscript c for cohort which is omitted here for ease 
of notation. 
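The dependent variables in the gain model (3) and the lagged model (4) can be illustrated with a minimal Python sketch. It assumes each student's scores are available as (student, grade, score) rows; the function name `stack_observations`, the field names, and the scores themselves are hypothetical and are not taken from HKR or from the state data files.

```python
from collections import defaultdict

def stack_observations(records):
    """Turn (student, grade, score) rows into one row per consecutive-grade pair.

    Each output row carries the current score, the prior-grade (lagged) score,
    and the gain, i.e. the quantities used by the gain and lagged models.
    """
    by_student = defaultdict(dict)
    for student, grade, score in records:
        by_student[student][grade] = score

    stacked = []
    for student in sorted(by_student):
        grades = by_student[student]
        for grade in sorted(grades):
            if grade - 1 in grades:  # a prior-grade score is required
                lagged = grades[grade - 1]
                current = grades[grade]
                stacked.append({
                    "student": student,
                    "grade": grade,
                    "score": current,
                    "lagged_score": lagged,
                    "gain": current - lagged,
                })
    return stacked

# Hypothetical standardized scores for two students (not real data).
records = [
    ("s1", 4, 0.25), ("s1", 5, 0.50), ("s1", 6, 0.25),
    ("s2", 4, -0.50), ("s2", 5, -0.25),
]
stacked = stack_observations(records)
for row in stacked:
    print(row)
```

In this layout the gain model would regress `gain` on student, school, and peer characteristics, while the lagged model would regress `score` on the same characteristics plus `lagged_score`.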

The extent to which coefficients B, C, and D differ between (3) and (4) will depend on the size of $\theta$. If $\theta$
is close to 1, which is often the case with achievement test scores measured one year apart, then the coefficients
may not differ appreciably. Peer effects are distinguished for average racial composition (% black, % Hispanic), 
average poverty level, and average achievement level of a given grade within a school. 
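The relationship between the lagged and gain specifications can be made explicit by subtracting prior-year achievement from both sides of the lagged model. The following is only an algebraic restatement in the notation used above, not an additional result reported by HKR:

\[ A_{isg} - A_{is(g-1)} = X_{isg} B^* + S_{sg} C^* + P_{sg} D^* + (\theta - 1) A_{is(g-1)} + u_{isg} \]

When $\theta = 1$ the term in prior achievement drops out and the expression has the same form as the gain model, so that $B^*$, $C^*$, and $D^*$ coincide with B, C, and D. When $\theta$ differs from 1, the gain model in effect omits $(\theta - 1) A_{is(g-1)}$, and its coefficients would be expected to absorb part of that term to the extent that prior achievement is correlated with the student, school, and peer variables.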

In addition to this basic functional form, HKR attempt to reduce error variance by controlling for several
types of fixed effects. The Texas data have student test scores over several grades (4 to 7), several years (1993 to
1997), thousands of schools, and hundreds of thousands of students. By “stacking” gain or lagged test scores,
where an observation is a single gain score or a single level score and a lagged score within a given grade,

5 By cohort, we mean a group of students who start a grade in a given year and then progress through later grades; e.g., one cohort
would be students who start 3rd grade in 2001; a second cohort would be students who start 3rd grade in 2002.


school, and cohort, they can estimate fixed effects for various combinations of students, grades, schools, 
attendance zones, and years or cohorts. The analytic advantage of fixed effect models is removing unmeasured 
differences among students, schools, and attendance zones that are constant over the period of study. 

Specifically, four fixed effect combinations are estimated in most of their 2004 and 2006 models: student, year by 
grade, school by grade, and attendance zone by year. 
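How a fixed effect removes unmeasured differences that are constant over time can be sketched for the simplest case, a student fixed effect handled by demeaning within student. This is a minimal illustration on noise-free, made-up data; the helper names (`demean_within`, `slope_through_origin`) and the value of `BETA` are assumptions for the example, and the sketch does not reproduce HKR's estimation procedure or the additional school-by-grade and attendance-zone effects.

```python
from collections import defaultdict

def demean_within(groups, values):
    """Subtract each group's mean from its members (the 'within' transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]

def slope_through_origin(x, y):
    """Least-squares slope of y on x with no intercept (suitable for demeaned data)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical data: gain = constant student effect + BETA * proportion black.
BETA = -0.3
student_effect = {"s1": 0.4, "s2": -0.2}
students = ["s1", "s1", "s1", "s2", "s2", "s2"]
prop_black = [0.2, 0.3, 0.5, 0.6, 0.7, 0.9]
gains = [student_effect[s] + BETA * p for s, p in zip(students, prop_black)]

# Demeaning within student removes the constant student effect from both sides.
x_within = demean_within(students, prop_black)
y_within = demean_within(students, gains)
beta_within = slope_through_origin(x_within, y_within)
print(round(beta_within, 6))
```

Because the student effect is constant across a student's observations, it cancels in the demeaned data, and the slope is identified only from year-to-year changes in peer composition for the same student.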

In a more recent but quite different paper on peer effects, Hanushek and Rivkin have changed to
aggregate models whereby scores on all individual student measures (test scores, socioeconomic status, etc.) are
averaged for each school by grade by year group, separately for each race, and then the coefficients B, C, D and
$\theta$ are estimated using the aggregated form of the data (Hanushek and Rivkin, 2007). The reason for shifting to an
aggregate analysis has to do with computational problems when trying to estimate both student and school by 
grade fixed effects simultaneously, an issue discussed in the next section. 

In switching to an aggregate model, Hanushek and Rivkin have dropped estimation of student fixed
effects. However, since all of the other fixed effect factors (year by grade, school by grade, attendance zone by
year) can be estimated with the individual student data, the advantages offered by aggregation are unclear.
Aggregation has several analytic drawbacks, not the least of which is a problem caused by students who repeat 
one or more grades over time. 6 Moreover, retaining the individual student data allows estimation of student fixed 
effects, albeit not simultaneously with school by grade fixed effects. 
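The aggregation step described above, averaging student measures within school by grade by year groups separately by race, can be sketched as follows. The function name `cell_means`, the field names, and the rows are hypothetical and serve only to show the shape of the aggregated data.

```python
from collections import defaultdict

def cell_means(rows, keys, value):
    """Average `value` within each cell defined by the tuple of `keys`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        cell = tuple(row[k] for k in keys)
        totals[cell] += row[value]
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

# Hypothetical student-level rows (not real data).
rows = [
    {"school": "A", "grade": 5, "year": 1996, "race": "black", "score": 0.25},
    {"school": "A", "grade": 5, "year": 1996, "race": "black", "score": -0.75},
    {"school": "A", "grade": 5, "year": 1996, "race": "white", "score": 0.5},
    {"school": "B", "grade": 5, "year": 1996, "race": "black", "score": 0.0},
]
means = cell_means(rows, keys=("school", "grade", "year", "race"), value="score")
for cell, mean in sorted(means.items()):
    print(cell, mean)
```

Once the data are collapsed to one row per cell, individual students can no longer be followed across years, which is why student fixed effects cannot be estimated from the aggregated form.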

Review of the Texas Results 

A summary of key results for the original models estimated by HKR is provided in Table 1. The figures 
are adapted from Table 1 of their 2004 and 2006 versions of the paper, as indicated. The coefficients in their 
papers are scaled to reflect the proportion of black peers. For ease of interpretation and also comparison with our 
replication results, we have changed the scale of the coefficients in Table 1 to reflect a 10% change in percent 
black rather than a unit change in the proportion black (in the HKR papers all test scores are standardized to mean 
0 and variance 1 for each grade and year, and black peers are measured as a proportion). Thus a coefficient
should be read as the change in achievement, in standard deviations, for a 10 percentage point increase in black peers. 7
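The rescaling is simple arithmetic, sketched below. A 10 percentage point change equals a change of 0.10 in the proportion black, so a proportion-scale coefficient is multiplied by 0.10. The function names are hypothetical, and the input value -0.30 is an illustrative proportion-scale coefficient rather than a figure quoted from the HKR papers.

```python
def per_ten_points(coef_per_unit_proportion):
    """Rescale a coefficient on proportion black (0 to 1) to a 10-point change."""
    return coef_per_unit_proportion * 0.10

def implied_change(effect_per_ten_points, change_in_points):
    """Predicted change in achievement (in SDs) for a change in percent black."""
    return effect_per_ten_points * (change_in_points / 10.0)

# Illustrative (hypothetical) proportion-scale coefficient.
coef_ten = per_ten_points(-0.30)
print(round(coef_ten, 3))

# Predicted change for a 50-point reduction in percent black at that effect size.
change = implied_change(coef_ten, -50)
print(round(change, 3))
```

The second function applies the same linear scaling in reverse, which assumes the effect is constant across the whole range of percent black.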



6 Students who repeat grades — about 5% of black students and 2% of white students — have to be eliminated when aggregating by year- 
grade-school groups, because they create extra year-school-grade groups with very low test scores. But they can be included and 
identified in the individual student analysis. 

7 It is unlikely that a black student would experience a 100-point change in percentage of black peers in one year, even in the most 
comprehensive desegregation plan. In Southern school districts during the early 1970s, a change of more than 60 or 70 percentage 
points was rare. 




Table 1 Effect of School Percent Black on Black Math Achievement in Texas (HKR papers)

Dep. Variable    Model   Effect a   Sig.   Fixed effects and other controls b

2004 Paper
Gain scores      (1)     -.007      **     None
Gain scores      (2)     +.029      ***    Student fixed effects
Gain scores      (3)     -.030      **     Student & school x grade fixed effects
Gain scores      (4)     -.031      **     Student, school x grade, & att. zones x year fixed effects

2006 Paper
Lagged scores    (5)     -.009      **     None
Lagged scores    (6)     -.021             Student & school x grade fixed effects (from Table 2)
Lagged scores    (7)     -.023      **     School x grade & att. zones fixed effects
Lagged scores    (8)     -.022      **     Fixed effects from model (3) plus school & teacher characteristics
Lagged scores    (9)     -.024      **     All controls from model (4) plus average peer achievement (twice lagged)

Sources: Hanushek, Kain, and Rivkin, 2004 & 2006, Table 1. * p<.05 ** p<.01 *** p<.001

a Expected change in math or gain scores (in standard deviations), given a 10% increase in percentage black students in a
given year.

b In addition to the controls listed, all models include a full set of grade-by-year dummies, indicators for school changes
(other than to middle school), and indicators for free lunch eligibility.



There are several surprising features in these results. First, the gain score models show unusual and very 
large reversals in the signs of the coefficients as fixed effect controls are added. Comparing gain models (1) to 
(3), the coefficient changes from a modest negative effect (-.007) without controls to a much stronger positive 
effect (+.029) after controlling for student fixed effects, then it reverses to a strong negative effect (-.030) after 
adding school by grade fixed effects. Note that adding attendance zones does not have an appreciable effect in 
the gain score models. 

The coefficients for the lagged achievement model do not show such large swings; in fact, considering the 
gain models, they are somewhat surprising for their lack of change. For example, there is little difference 
between removing student and school by grade only (-.021) and removing school by grade and attendance zone 
by grade only (-.023). There is also very little change after removing average peer achievement (-.024). This 
lack of a peer achievement effect is inconsistent with most of the research on this issue, as reported above, where 
addition of peer achievement usually reduces the black peer effect to a significant degree. Note, also, that 
addition of school and teacher controls has very little effect (-.022). 

The size of the effect in these models is also surprising. An effect of -.030 implies that a 50-point
reduction in school percent black (not uncommon in a comprehensive desegregation plan) would raise black
math scores by .15 standard deviations in a single year. In their discussion of impact, HKR suggest this effect is
cumulative over grades, in which case the impact would be multiplied by the number of grades or years that the 
desegregation lasted. If this reduction lasted five years, and assuming that white achievement does not change, 





